Uncertainty Estimation for 3D Dense Prediction via Cross-Point Embeddings
Dense prediction tasks are common for 3D point clouds, but the uncertainties inherent in massive points and their embeddings have long been ignored. In this work, we present CUE, a novel uncertainty estimation method for dense prediction tasks in 3D point clouds. Inspired by metric learning, the key idea of CUE is to explore cross-point embeddings on top of a conventional 3D dense prediction pipeline. Specifically, CUE involves building a probabilistic embedding model and then enforcing metric alignments of massive points in the embedding space. We also propose CUE+, which enhances CUE by explicitly modeling cross-point dependencies in the covariance matrix. We demonstrate that both CUE and CUE+ are generic and effective for uncertainty estimation in 3D point clouds on two different tasks: (1) in 3D geometric feature learning, we obtain well-calibrated uncertainty for the first time, and (2) in semantic segmentation, we reduce the uncertainty's Expected Calibration Error relative to the state of the art by 16.5%. All uncertainties are estimated without compromising predictive performance.
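The Expected Calibration Error (ECE) cited above is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below illustrates that standard binned definition only; it is not code from the CUE paper, and the function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted confidences in [0, 1], one per prediction.
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width bins (illustrative default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A lower value means the reported confidences track the observed accuracy more closely; a model that is 90% confident but right only 75% of the time contributes a gap of 0.15 in that bin.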
STUN: Self-Teaching Uncertainty Estimation for Place Recognition
Place recognition is key to Simultaneous Localization and Mapping (SLAM) and
spatial perception. However, place recognition in the wild often suffers from
erroneous predictions due to image variations, e.g., changing viewpoints and
street appearance. Integrating uncertainty estimation into the life cycle of
place recognition is a promising method to mitigate the impact of variations on
place recognition performance. However, existing uncertainty estimation
approaches in this vein either are computationally inefficient (e.g., Monte
Carlo dropout) or come at the cost of reduced accuracy. This paper proposes STUN, a
self-teaching framework that learns to simultaneously predict the place and
estimate the prediction uncertainty given an input image. To this end, we first
train a teacher net using a standard metric learning pipeline to produce
embedding priors. Then, supervised by the pretrained teacher net, a student net
with an additional variance branch is trained to finetune the embedding priors
and estimate the uncertainty sample by sample. During the online inference
phase, we only use the student net to generate a place prediction in
conjunction with the uncertainty. When compared with place recognition systems
that ignore uncertainty, our framework provides uncertainty
estimation for free without sacrificing any prediction accuracy. Our
experimental results on the large-scale Pittsburgh30k dataset demonstrate that
STUN outperforms the state-of-the-art methods in both recognition accuracy and
the quality of uncertainty estimation.
Comment: To appear at the 35th IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2022)
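The abstract above describes a student network with a variance branch that is supervised by a pretrained teacher, but it does not state the loss. One common way to train such a branch is a per-dimension Gaussian negative log-likelihood that treats the teacher embedding as the target. The sketch below shows that generic loss for a single sample; it is an assumption for illustration and not necessarily the objective used in STUN, and all names are hypothetical.

```python
import math


def gaussian_nll_distillation(student_mean, student_log_var, teacher_embedding):
    """Mean per-dimension Gaussian NLL of the teacher embedding (constants dropped).

    student_mean:      student's predicted embedding (list of floats).
    student_log_var:   student's predicted log-variance per dimension.
    teacher_embedding: embedding from the frozen teacher, used as the target.
    """
    if not (len(student_mean) == len(student_log_var) == len(teacher_embedding)):
        raise ValueError("all inputs must have the same length")
    if not student_mean:
        raise ValueError("embeddings must be non-empty")

    total = 0.0
    for mu, log_var, target in zip(student_mean, student_log_var, teacher_embedding):
        # Squared error is down-weighted where predicted variance is high;
        # the log-variance term penalizes claiming high variance everywhere.
        total += 0.5 * math.exp(-log_var) * (mu - target) ** 2 + 0.5 * log_var
    return total / len(student_mean)
```

Under this loss, the student lowers its penalty on hard samples by predicting a larger variance, which is what lets the variance output act as a per-sample uncertainty estimate.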
Nowhere to Hide: Cross-modal Identity Leakage between Biometrics and Devices
Along with the benefits of the Internet of Things (IoT) come potential privacy risks, since billions of connected devices are granted permission to track information about their users and communicate it to other parties over the Internet. Of particular interest to the adversary is the user identity, which plays an important role in launching attacks. While the exposure of a certain type of physical biometrics or device identity has been extensively studied, the compound effect of leakage from both sides remains unknown in multi-modal sensing environments. In this work, we explore the feasibility of compound identity leakage across cyber-physical spaces and reveal that co-located smart device IDs (e.g., smartphone MAC addresses) and physical biometrics (e.g., facial/vocal samples) are side channels to each other. We demonstrate that our method is robust to various kinds of observation noise in the wild and that an attacker can comprehensively profile victims along multiple dimensions with nearly zero analysis effort. Two real-world experiments on different biometrics and device IDs show that the presented approach can compromise more than 70% of device IDs and harvest multiple biometric clusters with ~94% purity at the same time.
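Cluster purity, the metric behind the ~94% figure above, is usually defined as the fraction of samples that belong to the majority true identity of their cluster. The sketch below shows that standard definition; the function name and input layout are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Fraction of samples whose true label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count true labels inside each cluster.
    label_counts = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        label_counts[cid][label] += 1

    # Each cluster contributes the size of its largest single-label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)
```

For example, five samples split into clusters `[0, 0, 0, 1, 1]` with true identities `["a", "a", "b", "c", "c"]` give a purity of 4/5, since one sample in cluster 0 does not match that cluster's majority identity.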
Risk Controlled Image Retrieval
Most image retrieval research focuses on improving predictive performance,
but such methods may fall short in scenarios where the reliability of the prediction is
crucial. Though uncertainty quantification can help by assessing uncertainty
for query and database images, this method can provide only a heuristic
estimate rather than a guarantee. To address these limitations, we present
Risk Controlled Image Retrieval (RCIR), which generates retrieval sets that are
guaranteed to contain the ground truth samples with a predefined probability.
RCIR can be easily plugged into any image retrieval method, agnostic to data
distribution and model selection. To the best of our knowledge, this is the
first work that provides coverage guarantees for image retrieval. The validity
and efficiency of RCIR are demonstrated on four real-world image retrieval
datasets, including the Stanford CAR-196 (Krause et al. 2013), CUB-200 (Wah et
al. 2011), the Pittsburgh dataset (Torii et al. 2013) and the ChestX-Det
dataset (Lian et al. 2021).
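The abstract above does not say how the retrieval sets are built. One standard way to obtain distribution-free coverage of this kind is split conformal prediction: calibrate a distance threshold on held-out queries whose true matches are known, then return every database item within that threshold. The sketch below illustrates that generic technique under the usual exchangeability assumption; it is not the RCIR algorithm, and the function names and inputs are hypothetical.

```python
import math


def calibrate_threshold(calib_distances, alpha):
    """Distance threshold giving at least (1 - alpha) coverage on exchangeable data.

    calib_distances: for each calibration query, the distance to its true match.
    alpha:           tolerated miss rate, in (0, 1).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    n = len(calib_distances)
    # Finite-sample corrected rank used in split conformal prediction.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points: only the full database is safe.
        return math.inf
    return sorted(calib_distances)[rank - 1]


def retrieval_set(query_distances, threshold):
    """Indices of database items whose distance to the query is within threshold."""
    return [i for i, d in enumerate(query_distances) if d <= threshold]
```

A smaller `alpha` raises the threshold and therefore enlarges the returned sets, which is the usual trade-off between the coverage level and how useful the retrieval set is.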
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
As we approach the era of ubiquitous computing, human motion sensing plays a
crucial role in smart systems for decision making, user interaction, and
personalized services. Extensive research has been conducted on human tracking,
pose estimation, gesture recognition, and activity recognition, which are
predominantly based on cameras in traditional methods. However, the intrusive
nature of cameras limits their use in smart home applications. To address this,
mmWave radars have gained popularity due to their privacy-friendly features. In
this work, we propose milliFlow, a novel deep learning method for
scene flow estimation as complementary motion information for mmWave point
cloud, serving as an intermediate level of features and directly benefiting
downstream human motion sensing tasks. Experimental results demonstrate the
superior performance of our method with an average 3D endpoint error of 4.6cm,
significantly surpassing the competing approaches. Furthermore, by
incorporating scene flow information, we achieve remarkable improvements in
human activity recognition, human parsing, and human body part tracking. To
foster further research in this area, we provide our codebase and dataset for
open access.
Comment: 15 pages, 8 figures
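The 3D endpoint error reported above is conventionally the mean Euclidean distance between predicted and ground-truth flow vectors over all points. The sketch below shows that standard definition; the function name and the list-of-tuples input layout are illustrative assumptions, and the result is in whatever unit the flow vectors use.

```python
import math


def endpoint_error_3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow vectors.

    pred_flow, gt_flow: sequences of (dx, dy, dz) tuples, one per point.
    """
    if len(pred_flow) != len(gt_flow):
        raise ValueError("pred_flow and gt_flow must have the same length")
    if not pred_flow:
        raise ValueError("at least one point is required")

    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    total = sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow))
    return total / len(pred_flow)
```

With flows expressed in meters, the abstract's 4.6 cm would correspond to a returned value of about 0.046.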